model: [WIP] explicit quantized layer #268

rebel-jongho · 2025-09-01T07:57:14Z

Pull Request Description

Type of Change

- New Model Support
- Bug fix (non-breaking change which fixes an issue)
- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)
- Documentation update
- Performance improvement
- Code refactoring
- Other (please describe):

Changes Overview

Motivation and Context

Checklist

- I have performed a self-review of my own code
- I have added tests that prove my fix is effective or that my feature works (if needed)

Additional Information

Related Issues

Conventional commit

type(optional scope): description

Type candidate

Model Updates
- model: Add new models or fix bugs in existing models
  - ex) Add LlavaNext
  - ex) Bugfix Whisper
Enhancements
- performance: Optimizing some models or this library itself
  - ex) Loading RBLNModel faster
  - ex) Optimizing Memory Usage of DecoderOnlyModel
Code Refactor
- refactor: Rearrange class architecture or similar restructuring.
  - ex) Refactor Seq2Seq
Documentation
- doc: Update docstring only
Library Dependencies
- dependency: Update requirements or make similar dependency changes.
Other
- other: None of the above.
  - ex) ci update
  - ex) pdm update
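The `type(optional scope): description` convention and the type candidates above are regular enough to check mechanically. The sketch below is a hypothetical helper, not part of the repository's tooling; it assumes the allowed types are exactly the six candidates listed above, that the scope may be omitted, and that the description follows a colon and a single space.

```python
import re

# Types copied from the "Type candidate" list; the parser itself is a hypothetical helper.
ALLOWED_TYPES = ("model", "performance", "refactor", "doc", "dependency", "other")

TITLE_RE = re.compile(
    r"^(?P<type>" + "|".join(ALLOWED_TYPES) + r")"  # required type
    r"(?:\((?P<scope>[^()]+)\))?"                 # optional, non-empty (scope)
    r": (?P<description>\S.*)$"                   # colon, space, non-empty description
)


def parse_title(title: str):
    """Return (type, scope, description) if the title follows the convention, else None."""
    match = TITLE_RE.match(title)
    if match is None:
        return None
    return match.group("type"), match.group("scope"), match.group("description")


if __name__ == "__main__":
    print(parse_title("model: [WIP] explicit quantized layer"))
    print(parse_title("refactor(seq2seq): Refactor Seq2Seq"))
    print(parse_title("feat: something"))  # not an allowed type
```

Under these assumptions, this PR's own title parses with type `model` and no scope, while a title using a type outside the list is rejected.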

Copilot

Pull Request Overview

This PR refactors the quantization layer creation system to use explicit quantized layer classes instead of dynamically modifying linear layers. The changes introduce separate QIntLinear and QFloatLinear classes with their own forward methods, replacing the previous approach of monkey-patching forward methods onto existing layers.

Key changes:

- Introduction of explicit quantized layer classes (QLinear, QIntLinear, QFloatLinear)
- Refactored layer creation methods to use these new classes instead of dynamic modification
- Updated parameter handling to support proper data type management for scales
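The overview contrasts explicit quantized layer classes with patching `forward` onto existing linear layers. A minimal pure-Python sketch of the explicit-class idea follows, assuming symmetric per-output-channel int8 weight-only quantization. `FloatLinear`, `QLinearSketch`, `QIntLinearSketch`, `from_linear`, and the scale floor `EPS` are hypothetical names chosen for illustration; this is not the code in `qlinear.py`, and no `QFloatLinear` counterpart is sketched because the overview does not describe its format.

```python
from typing import List, Optional

Matrix = List[List[float]]
Vector = List[float]


def _matvec(weight: Matrix, x: Vector, bias: Optional[Vector]) -> Vector:
    """Compute y = W @ x (+ b) for a single input vector."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


class FloatLinear:
    """Stand-in for an unquantized linear layer (e.g. torch.nn.Linear)."""

    def __init__(self, weight: Matrix, bias: Optional[Vector] = None):
        self.weight = weight
        self.bias = bias

    def forward(self, x: Vector) -> Vector:
        return _matvec(self.weight, x, self.bias)


class QLinearSketch:
    """Hypothetical base class: holds quantized weights and per-output-channel scales."""

    def __init__(self, qweight: List[List[int]], scales: Vector, bias: Optional[Vector] = None):
        self.qweight = qweight
        self.scales = scales
        self.bias = bias


class QIntLinearSketch(QLinearSketch):
    """Hypothetical symmetric int8 weight-only layer with its own explicit forward()."""

    QMAX = 127
    EPS = 1e-8  # assumed lower bound on each scale (illustrative choice)

    @classmethod
    def from_linear(cls, layer: FloatLinear) -> "QIntLinearSketch":
        qweight, scales = [], []
        for row in layer.weight:
            # Clamp the scale away from zero so an all-zero row cannot divide by zero.
            scale = max(max(abs(w) for w in row) / cls.QMAX, cls.EPS)
            qweight.append([max(-cls.QMAX, min(cls.QMAX, round(w / scale))) for w in row])
            scales.append(scale)
        return cls(qweight, scales, layer.bias)

    def forward(self, x: Vector) -> Vector:
        # Dequantize each output row with its own scale, then run the ordinary matvec.
        weight = [[q * s for q in row] for row, s in zip(self.qweight, self.scales)]
        return _matvec(weight, x, self.bias)


if __name__ == "__main__":
    model = {"proj": FloatLinear([[0.5, -1.0], [0.0, 0.0]], bias=[0.1, 0.2])}
    # Swap the layer object for a quantized one instead of rebinding its forward method.
    model["proj"] = QIntLinearSketch.from_linear(model["proj"])
    print(model["proj"].forward([2.0, 1.0]))
```

With a dedicated class, converting a layer replaces the object rather than mutating it, so the quantized parameters (`qweight`, `scales`) and the forward path are visible on the type itself and `isinstance` checks keep working.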

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File	Description
src/optimum/rbln/transformers/utils/rbln_quantization.py	Refactored quantization logic to use explicit layer classes and improved scale parameter handling
src/optimum/rbln/transformers/utils/qlinear.py	Added new quantized linear layer classes with explicit forward implementations


src/optimum/rbln/transformers/utils/qlinear.py

rebel-jongho added 6 commits August 13, 2025 17:04

- initial run (d5a1638)
- a8 (c907101)
- merge main (f9af79b)
- fix dtype (afe4986)
- remove n_layer if none (084aa93)
- ruff (1784ef6)

rebel-jongho requested a review from Copilot September 8, 2025 06:39

Copilot AI reviewed Sep 8, 2025


rebel-jongho added 3 commits October 28, 2025 14:32

- Merge remote-tracking branch 'origin/main' into model/quantize_layer (e0c75b6)
- add dynamic (582785e)
- add clamp for numerical stability (0fc9f44)
